54 research outputs found

    Optimal Compilation of HPF Remappings

    International audience. Applications with varying array access patterns require dynamic changes to array mappings on distributed-memory parallel machines. HPF (High Performance Fortran) provides such remappings, on data that may be replicated, explicitly through the realign and redistribute directives and implicitly at procedure calls and returns. However, these features are left out of the HPF subset and of the currently discussed HPF kernel for efficiency reasons. This paper presents a new compilation technique to handle HPF remappings for message-passing parallel architectures. The first phase is global and removes all useless remappings that appear naturally in procedures. The code generated by the second phase takes advantage of replication to shorten the remapping time. It is proved optimal: a minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in HPFC, our prototype HPF compiler. Experiments were performed on a DEC Alpha farm.
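The saving that the optimal second phase exploits can be pictured with a toy sketch. The code below is illustrative only, not the HPFC algorithm: for a hypothetical one-dimensional BLOCK-to-CYCLIC remapping, it computes which element indices each processor pair must exchange; elements whose owner is unchanged generate no message.

```python
def block_owner(i, n, p):
    """Owner of element i under a BLOCK distribution of n elements on p processors."""
    b = -(-n // p)              # ceiling division: block size
    return i // b

def cyclic_owner(i, n, p):
    """Owner of element i under a CYCLIC(1) distribution."""
    return i % p

def remap_messages(n, p):
    """Map (source, destination) processor pairs to the indices that must move."""
    msgs = {}
    for i in range(n):
        src, dst = block_owner(i, n, p), cyclic_owner(i, n, p)
        if src != dst:          # elements that stay local need no message
            msgs.setdefault((src, dst), []).append(i)
    return msgs

# remapping an 8-element array on 2 processors: only 2 messages are needed
print(remap_messages(8, 2))     # {(0, 1): [1, 3], (1, 0): [4, 6]}
```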

    Parallelizing with BDSC, a resource-constrained scheduling algorithm for shared and distributed memory systems

    International audience. We introduce a new parallelization framework for scientific computing based on BDSC, an efficient automatic scheduling algorithm for parallel programs in the presence of resource constraints on the number of processors and their local memory size. BDSC extends Yang and Gerasoulis's Dominant Sequence Clustering (DSC) algorithm; it uses sophisticated cost models and addresses both shared and distributed parallel memory architectures. We describe BDSC, its integration within the PIPS compiler infrastructure and its application to the parallelization of four well-known scientific applications: Harris, ABF, equake and IS. Our experiments suggest that BDSC's focus on efficient resource management leads to significant parallelization speedups on both shared and distributed memory systems, improving upon DSC results, as shown by the comparison of the sequential and parallelized versions of these four applications running on both OpenMP and MPI frameworks.
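To make the scheduling setting concrete, here is a minimal sketch of resource-constrained list scheduling: a plain greedy scheme on p identical processors, far simpler than BDSC (no cost models, no memory constraints), with invented task names and costs.

```python
def list_schedule(tasks, deps, p):
    """Greedy list scheduling of a task DAG on p identical processors.
    tasks: {name: cost}; deps: {name: set of predecessor names}."""
    finish = {}                 # task -> completion time
    free = [0.0] * p            # next free time of each processor
    done = set()
    schedule = []               # (task, processor, start time)
    while len(done) < len(tasks):
        ready = [t for t in tasks
                 if t not in done and deps.get(t, set()) <= done]
        # schedule the ready task whose predecessors finish earliest
        t = min(ready, key=lambda t: max(
            (finish[d] for d in deps.get(t, set())), default=0.0))
        proc = min(range(p), key=lambda q: free[q])
        start = max(free[proc], max(
            (finish[d] for d in deps.get(t, set())), default=0.0))
        finish[t] = start + tasks[t]
        free[proc] = finish[t]
        done.add(t)
        schedule.append((t, proc, start))
    return schedule, max(finish.values())

# a joins the results of tasks that can run in parallel on two processors
sched, makespan = list_schedule({"a": 2, "b": 3, "c": 1},
                                {"c": {"a", "b"}}, p=2)
print(sched, makespan)
```

BDSC additionally bounds the number of clusters and the memory used by each, which this greedy sketch ignores.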

    Data and Process Abstraction in PIPS Internal Representation

    7 pages. International audience. PIPS, a state-of-the-art, source-to-source compilation and optimization platform, has been under development at MINES ParisTech since 1988, and its development is still going strong. Initially designed to perform automatic interprocedural parallelization of Fortran 77 programs, PIPS has been extended over the years to compile HPF (High Performance Fortran), C and Fortran 95 programs. Written in C, the PIPS framework has proven surprisingly resilient: its analysis and transformation phases have been reused, adapted and extended for new targets, such as generating code for special-purpose hardware accelerators, without requiring significant re-engineering of its core structure. We suggest that one of the key features explaining this adaptability is the PIPS internal representation (IR), which stores an abstract syntax tree. Although fit for source-to-source processing, the PIPS IR has emphasized from its origins maximum abstraction over target languages' specificities, and generic data-structure manipulation services via the Newgen domain-specific language, which provides key features such as type building, automatic serialization and powerful iterators. The state of software technology has advanced significantly over the last 20 years, and many of the pioneering features introduced by Newgen are nowadays present in modern programming frameworks. However, we believe that the methodology used to design the PIPS IR, presented in this paper, remains relevant today and could be put to good use in future compilation platform development projects.
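A minimal sketch, in Python rather than Newgen, of two of the services mentioned above: automatic serialization and a generic iterator over an abstract syntax tree. The node shapes (Const, Add) are invented for illustration and are not the PIPS IR.

```python
from dataclasses import dataclass, asdict

@dataclass
class Expr:
    pass

@dataclass
class Const(Expr):
    value: int = 0

@dataclass
class Add(Expr):
    left: Expr = None
    right: Expr = None

def walk(node):
    """Generic pre-order iterator: yields every sub-node, whatever its type."""
    yield node
    for name in getattr(node, "__dataclass_fields__", {}):
        child = getattr(node, name)
        if isinstance(child, Expr):
            yield from walk(child)

tree = Add(Const(1), Add(Const(2), Const(3)))
print(asdict(tree))                                            # automatic serialization
print([n.value for n in walk(tree) if isinstance(n, Const)])   # [1, 2, 3]
```

The point of the analogy: neither `walk` nor `asdict` mentions any specific node type, so new domains can be added without touching the traversal or serialization services.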

    Polyhedra and Compilation

    22 pages. International audience. The first use of polyhedra to solve a compilation problem, the automatic parallelization of loops in the presence of procedure calls, was described and implemented nearly thirty years ago. The polyhedral model is now internationally recognized and is being integrated into the GCC compiler, even though the exponential complexity of the associated algorithms was for a very long time a reason to reject them outright. The goal of this article is to give numerous examples of the use of polyhedra in an optimizing compiler and to show that they make it possible to state simple conditions guaranteeing the legality of transformations.
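One of the simple legality conditions the article alludes to is classic: a loop transformation given by an integer matrix T is legal when every dependence distance vector stays lexicographically positive after transformation. A minimal sketch, with invented dependence vectors:

```python
def lex_positive(v):
    """True iff vector v is lexicographically positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False

def legal(T, distances):
    """Check that transformation matrix T preserves every dependence."""
    def apply(T, d):
        return [sum(T[i][j] * d[j] for j in range(len(d)))
                for i in range(len(T))]
    return all(lex_positive(apply(T, d)) for d in distances)

interchange = [[0, 1], [1, 0]]          # swap the two loops of a nest
print(legal(interchange, [(1, 0)]))     # True: (1, 0) becomes (0, 1)
print(legal(interchange, [(1, -1)]))    # False: (1, -1) becomes (-1, 1)
```

Loop interchange is legal for the dependence (1, 0) but not for (1, -1), which it would reverse.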

    PIPS Is not (just) Polyhedral Software Adding GPU Code Generation in PIPS

    6 pages. International audience. Parallel and heterogeneous computing are growing in audience thanks to the increased performance brought by ubiquitous manycores and GPUs. However, available programming models, like OpenCL or CUDA, are far from straightforward to use. As a consequence, several automated or semi-automated approaches have been proposed to generate hardware-level code from high-level sequential sources. Polyhedral models are becoming more popular because they combine expressiveness, compactness, and an accurate abstraction of the data-parallel behaviour of programs. These models provide automatic or semi-automatic parallelization and code transformation capabilities targeting such modern parallel architectures. PIPS is a quarter-century-old source-to-source transformation framework that initially targeted parallel machines but then evolved to include other targets. PIPS uses abstract interpretation on an integer polyhedral lattice to represent program code, allowing interprocedural linear relation analysis on integer variables. The same representation is used for the dependence test and for convex array region analysis. The polyhedral model is also used, more classically, to schedule code from linear constraints. In this paper, we illustrate the features of this compiler infrastructure on a hypothetical input code, demonstrating the combination of polyhedral and non-polyhedral transformations. PIPS interprocedural polyhedral analyses are used to generate data transfers and are combined with non-polyhedral transformations to achieve efficient CUDA code generation.
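As a toy illustration of how a convex array region can size a data transfer (a sketch under simplifying assumptions, not the PIPS analysis): for an affine access A[a*i + b] with i ranging over [lo, hi], the accessed region is an interval, and only that slice needs to be copied to the accelerator.

```python
def access_region(a, b, lo, hi):
    """Bounds of the accessed set {a*i + b | lo <= i <= hi}."""
    ends = (a * lo + b, a * hi + b)
    return min(ends), max(ends)

def transfer_bytes(a, b, lo, hi, elem_size=8):
    """Size of the contiguous slice covering the accessed region."""
    first, last = access_region(a, b, lo, hi)
    return (last - first + 1) * elem_size

# A[2*i + 1] for i in 0..99 touches only A[1..199]
print(access_region(2, 1, 0, 99))     # (1, 199)
print(transfer_bytes(2, 1, 0, 99))    # 1592 bytes, not the whole array
```

The real analysis handles multi-dimensional polyhedral regions interprocedurally; the interval above is the one-dimensional degenerate case.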

    Safety: From Analysis to Instrumentation and Code Synthesis

    Multiprocessor and multicore machines and GPU-like accelerators are becoming widespread, yet it is increasingly difficult for programmers to exploit their capabilities. Source-to-source compilation of applications eases the development of efficient implementations for these complex architectures.

    My research follows two main axes. The first concerns the compilation and optimization of applications for efficient execution on parallel architectures. The second concerns the use of integer linear algebra to model the problems encountered.

    This thesis first presents the static and dynamic program analyses developed to verify the correctness of a code and ease its maintenance, such as conformance to the source language standard, detection of uninitialized variables, verification that array element accesses stay within bounds, and computation of data dependences. These analyses have been integrated into the PIPS compiler.

    Then, methods for the automatic generation of control code and communications from specifications are presented. Models of the placement of data on processors, of the computations that can be executed in parallel, and of the communications that must be generated to preserve data coherence are proposed. The code synthesis algorithms specifically developed for HPF, for a shared virtual memory emulated on a distributed architecture, and for DMA-compatible transfers are then detailed.

    Finally, a global approach to the problem of mapping an application onto a parallel embedded architecture, using concurrent constraint logic programming as its engine, is presented.

    A single abstraction was chosen for the analyses and for modeling the optimization and code generation problems: a system of linear constraints over integer variables. The choice of Z-polyhedra made it possible to stay within an algebraic framework offering a wide range of resolution methods well suited to proofs.
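As an illustration of the kind of dynamic checks such instrumentation inserts, here is a toy Python sketch (not the PIPS-generated code): every array access is guarded against out-of-bounds indices, and reads of never-written elements are flagged as uses of uninitialized values.

```python
class CheckedArray:
    """Array wrapper that checks bounds and initialization on every access."""

    def __init__(self, n):
        self.n = n
        self.data = [None] * n
        self.written = [False] * n

    def __setitem__(self, i, v):
        if not 0 <= i < self.n:
            raise IndexError(f"write out of bounds: index {i}, size {self.n}")
        self.data[i] = v
        self.written[i] = True

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(f"read out of bounds: index {i}, size {self.n}")
        if not self.written[i]:
            raise ValueError(f"read of uninitialized element {i}")
        return self.data[i]

a = CheckedArray(3)
a[0] = 1
print(a[0])     # 1: a legal, initialized access
```

A static analysis proves many of these checks redundant; the instrumentation only needs to guard the accesses that cannot be proven safe at compile time.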

    Broadcast And Surveillance TechnologIes Over Networks

    Mark Verhoeven (Axon), Willem-Jan Dirks (Axon), Egwin Wesselink (Axon), Teun Selten (Axon), Sander van Kolck (Axon), Jochem Herrmann (Adimec), Adriaan Umans (Adimec), Joost van Kuijk (Adimec), Marcel Dijkema (Adimec), Klaas Jan Damstra (GVN), John Hommel (GVN), Robert Pot (GVN), Joost Uijtdehaag (GVN), Patrick Henckes (Caeleste), Bart Dierickx (Caeleste), Bert Luyssaert (Caeleste), Pascal Douine (e2v), Jean-Luc Diverchy (e2v), Alain Prevost (e2v), Philippe Kuntz (e2v), Christophe Guettier (Sagem), Marc Bousquet (Sagem), Romuald Perinelle (Sagem), François Gendry (Sagem), Corinne Ancourt (Armines), François Irigoin (Armines), Claude Tadonki (Armines), Peter Brookes (Altera/Intel). International audience. The main objective of the BASTION project is to research and develop new applications for the broadcast market and for the security and surveillance markets. Both applications will be built on top of an Internet Protocol network, which will allow the applications to be distributed over several physical sites. The cameras will be located on one or more sites, and the monitoring/control room will be on a separate site. The main benefit of this distribution is that it will increase the efficiency of producing live broadcast content by a factor of 2-3, by sending only camera personnel to remote sites and having the main production team in the home studio produce several programs in a single day. The project also integrates high-quality, high-resolution (HD and higher) image sensors in a networked infrastructure to detect, recognize and identify surveillance and security issues when observing long-distance or large-scale events.

    COLA-Gen: Active Learning Techniques for Automatic Code Generation of Benchmarks

    Benchmarking is crucial in code optimization: a set of programs considered representative is required to validate optimization techniques or evaluate predictive performance models. However, there is a shortage of benchmarks available for code optimization, a shortage even more pronounced when machine learning techniques are used, because these techniques are sensitive to the quality and quantity of the data used for training. Our work aims to address these limitations. We present a methodology to efficiently generate benchmarks for the code optimization domain. It includes an automatic code generator, an associated DSL handling the high-level specification of the desired code, and a smart strategy for extending the benchmark as needed. The strategy is based on active learning techniques and helps generate the most representative data for our benchmark. We observed that machine learning models trained on our benchmark produce better-quality predictions and converge faster. The optimization based on the active learning method achieved up to 15% more speed-up than the passive learning method using the same amount of data.
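The active-learning loop described above can be sketched as follows. This is a toy model, not COLA-Gen: `measure_speedup` is a hypothetical stand-in for compiling and running a generated program, and a 1-nearest-neighbour predictor replaces the real performance model. At each step the candidate farthest from all labeled points (the one the model is least certain about) is queried and labeled.

```python
def predict(labeled, x):
    """1-nearest-neighbour prediction over labeled (feature, speedup) pairs."""
    _, s = min(labeled, key=lambda p: abs(p[0] - x))
    return s

def uncertainty(labeled, x):
    """Distance to the nearest labeled point: far means uncertain."""
    return min(abs(f - x) for f, _ in labeled)

def measure_speedup(x):
    """Hypothetical stand-in for compiling and running a generated program."""
    return 2.0 + 0.01 * x

def active_learn(pool, budget):
    """Label the first candidate, then query the most uncertain ones."""
    first = pool.pop(0)
    labeled = [(first, measure_speedup(first))]
    for _ in range(budget):
        x = max(pool, key=lambda c: uncertainty(labeled, c))  # query step
        pool.remove(x)
        labeled.append((x, measure_speedup(x)))
    return labeled

model = active_learn([0, 10, 20, 50, 100], budget=2)
print([f for f, _ in model])    # [0, 100, 50]: the most informative points
```

Passive learning would label the pool in order; the active strategy spends the same labeling budget on the points that shrink the model's uncertainty fastest, which is the effect the abstract quantifies as up to 15% extra speed-up.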